Usage of Dedicated Data Structures for URL Databases in a Large-Scale Crawling

Author

  • Krzysztof Dorosz

Abstract

Since the beginning of the Internet there has always been a need to browse its resources automatically for many purposes: indexing, cataloguing, validating, monitoring, etc. Because of today's large volume of the World Wide Web, the term Internet is often wrongly identified with the single HTTP protocol service. The process of browsing the World Wide Web in an automated manner is called crawling, most likely by analogy between traversing the WWW through URL anchors in HTML pages and a bug's crawl. A crawling process is realised by dedicated software called crawlers or robots. Because of the large scope of possible applications, crawlers can vary in architecture,
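The abstract above is truncated, so the paper's actual data structures are not shown here; as a minimal illustrative sketch only, a crawler's URL database must at least answer "have I seen this URL?" and "what do I fetch next?". The class below (hypothetical, not from the paper) pairs a FIFO frontier with a seen-set:

```python
from collections import deque
from urllib.parse import urldefrag

class URLFrontier:
    """Minimal URL database sketch: a FIFO frontier plus a seen-set,
    so each URL is scheduled at most once. Illustrative only; the
    paper's dedicated structures are more elaborate than this."""

    def __init__(self):
        self._queue = deque()  # URLs waiting to be crawled
        self._seen = set()     # every URL ever enqueued

    def add(self, url):
        url, _ = urldefrag(url)  # drop the #anchor fragment
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next(self):
        """Return the next URL to crawl, or None when the frontier is empty."""
        return self._queue.popleft() if self._queue else None

frontier = URLFrontier()
frontier.add("http://example.com/a#top")
frontier.add("http://example.com/a")  # duplicate after defragmenting
frontier.add("http://example.com/b")
print(frontier.next())  # http://example.com/a
```

At web scale the in-memory `set` would be replaced by a disk-backed or probabilistic structure, which is precisely the design space the title refers to.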


Similar articles

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download only domain-specific web pages; an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them a key technique is focused crawling, which is able to crawl particular topical...


Efficient Social Website Crawling Using Cluster Graph

Online social communities have gained significant popularity in recent years and have become an area of active research. Compared with general websites or well-structured Web forums, user-centered social websites pose several unique challenges for crawling, a fundamental task for data collection and data mining of large-scale online social communities: (1) Social websites have more complex link...



A TWO-STAGE DAMAGE DETECTION METHOD FOR LARGE-SCALE STRUCTURES BY KINETIC AND MODAL STRAIN ENERGIES USING HEURISTIC PARTICLE SWARM OPTIMIZATION

In this study, an approach for damage detection in large-scale structures is developed by employing kinetic and modal strain energies together with the Heuristic Particle Swarm Optimization (HPSO) algorithm. Kinetic strain energy is employed to determine the location of structural damage. After determining the suspected damage locations, the severity of damage is obtained based on variations of modal ...


IMPROVED BAT ALGORITHM FOR OPTIMUM DESIGN OF LARGE-SCALE TRUSS STRUCTURES

Determining the optimum design of large-scale structures is a difficult task. The great number of design variables, the largeness of the search space, and the great number of design constraints are major preventive factors in performing optimum design of large-scale truss structures in a reasonable time. Meta-heuristic algorithms are known as one of the useful tools to d...



Journal:
  • Computer Science (AGH)

Volume 10, Issue

Pages  -

Published 2009